Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to identify out of memory errors and print corresponding error message #58

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

pagmatt
Copy link
Member

@pagmatt pagmatt commented May 25, 2022

This PR aims to identify simulation crashes due to out of memory errors. Indeed, in these cases the simulation script is terminated by the kernel with signal SIGKILL (on POSIX systems), and no errors are shown in either stdout or stderr. Instead, with these changes the user will be informed of the likely cause of the crash.

P.S. I could not find a definition of the SIGKILL return code in any Python module, so I not-so-elegantly defined it myself. The syntax is taken from this PEP "guideline". Feel free to propose a different approach if a more elegant solutions comes to mind :)

@pagmatt pagmatt requested a review from DvdMgr May 25, 2022 11:31
Copy link
Member

@DvdMgr DvdMgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments. Once we address these it should be good to go!

sem/runner.py Outdated
Comment on lines 347 to 356
print(error_message)
print(error_message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove trailing whitespace

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, sorry about this

sem/runner.py Outdated
stderr_file.read(),
stdout_file.read()))
if return_code == SIGKILL_CODE:
error_message = '\nSimulation likely killed due to an out of memory error.\n' + \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not sound too confident about why the process was killed! How about:

"Simulation was killed. Possible causes may include an out of memory error."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitively better, you are right

Comment on lines +10 to +12
from typing import Final

SIGKILL_CODE: Final = -9 # POSIX return code which usually corresponds to out of memory events.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to import typing.Final here? We don't use typing anywhere else in the project.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessary, it just seemed "good practice" to use it for marking the variable (almost) as a constant. At least, that was my understanding from the PEP guideline which I linked above. If you prefer to limit the imports though I can remove it!

% (parameter,
stderr_file.read(),
stdout_file.read()))
if return_code == SIGKILL_CODE:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth it to always print return_code when it's not 0, even when it's not exactly -9. Can you add this information to the common_error_message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@pagmatt pagmatt requested a review from DvdMgr May 30, 2022 08:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants